FUNCTION ExtractNGrams(sequence, n):
    """
    Extracts all contiguous sub-sequences of length n from a given sequence.

    Args:
        sequence (list): An ordered list of items (e.g., actions ['jump', 'dash', 'shoot']).
        n (int): The size of the N-gram to extract (e.g., 2 for bigrams).

    Returns:
        list: A list of all N-grams found in the sequence.

    Example:
        ExtractNGrams(['jump', 'dash', 'shoot'], 2) -> [('jump', 'dash'), ('dash', 'shoot')]
    """
    ngrams = new List()

    // Iterate through the sequence, stopping when the remaining part
    // is shorter than the desired N-gram size 'n'.
    FOR i FROM 0 TO length(sequence) - n:
        // Slice the sequence from the current index 'i' to 'i+n'
        // to create a sub-sequence of length 'n'.
        ngram = sequence[i : i+n]
        // Add the newly created N-gram to the list of results.
        append ngram to ngrams

    RETURN ngrams
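The pseudocode above maps almost directly to Python. A minimal sketch (the snake_case name and the choice to return tuples are mine, not from the original application):

```python
def extract_ngrams(sequence, n):
    """Return all contiguous sub-sequences of length n as tuples."""
    # range stops once the remaining tail is shorter than n,
    # so the final slice is exactly n items long.
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

# Example: bigrams from a short action sequence.
print(extract_ngrams(['jump', 'dash', 'shoot'], 2))
# -> [('jump', 'dash'), ('dash', 'shoot')]
```

Returning tuples rather than lists makes the N-grams hashable, which is convenient if they are later counted with a `dict` or `collections.Counter`.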
FUNCTION train_logistic_regression(X, y, learning_rate, epochs, lambda):
    """
    Trains a Multinomial Logistic Regression model using Gradient Descent.

    Args:
        X (matrix): Training data of shape (m_samples, n_features).
        y (vector): True labels, one-hot encoded, of shape (m_samples, K_classes).
        learning_rate (float): Step size for weight updates.
        epochs (int): Number of passes over the training data.
        lambda (float): L2 regularization strength.
    """
    // 1. Initialize weights and biases with small random values or zeros.
    m, n = shape(X)
    K = number_of_classes
    W = new Matrix(n, K) of zeros
    b = new Vector(K) of zeros

    // 2. Loop for a number of epochs to iteratively improve the weights.
    FOR epoch FROM 1 TO epochs:
        // 3. Forward Pass: Make predictions with current weights.
        //    a. Calculate linear scores (logits) for each class.
        logits = X @ W + b
        //    b. Apply softmax to get probabilities.
        probabilities = softmax(logits)

        // 4. Gradient Calculation: Find the direction to update weights.
        //    The gradient of cross-entropy loss has a simple form.
        error = probabilities - y
        // Calculate gradients for weights and biases, including the
        // regularization term for weights.
        grad_W = (1/m) * (X.T @ error) + (lambda/m) * W
        grad_b = (1/m) * sum(error)

        // 5. Weight Update: Take a small step against the gradient.
        W = W - learning_rate * grad_W
        b = b - learning_rate * grad_b

    RETURN W, b
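A runnable NumPy version of the same loop is sketched below. It assumes `y` is already one-hot encoded, as in the pseudocode; the max-subtraction inside `softmax` is a standard numerical-stability trick, not something the pseudocode spells out:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating to avoid overflow.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_logistic_regression(X, y, learning_rate=0.1, epochs=1000, lam=0.0):
    """Gradient descent for multinomial logistic regression.
    X: (m, n) features; y: (m, K) one-hot labels; lam: L2 strength."""
    m, n = X.shape
    K = y.shape[1]
    W = np.zeros((n, K))
    b = np.zeros(K)
    for _ in range(epochs):
        probabilities = softmax(X @ W + b)          # forward pass
        error = probabilities - y                   # gradient of cross-entropy
        grad_W = (X.T @ error) / m + (lam / m) * W  # regularize weights only
        grad_b = error.sum(axis=0) / m              # biases are not regularized
        W -= learning_rate * grad_W
        b -= learning_rate * grad_b
    return W, b
```

Note that `lambda` is a reserved word in Python, hence the `lam` parameter name here.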
FUNCTION train_gradient_boosting(X, y, learning_rate, n_estimators):
    """
    Trains a Gradient Boosting model with regression trees for classification.
    (This is a simplified illustration of the core XGBoost logic).

    Args:
        X (matrix): Training data.
        y (vector): True labels (e.g., as integers 0, 1, 2, 3).
        learning_rate (float): Shrinks the contribution of each tree.
        n_estimators (int): The total number of trees to build.
    """
    // 1. Initialize the model with a simple base prediction.
    //    For classification, this is often the log-odds of the class frequencies.
    F_0 = calculate_initial_prediction(y)
    ensemble_prediction = F_0
    trees = new List()

    // 2. Loop for the number of estimators (trees) to build.
    FOR m FROM 1 TO n_estimators:
        // 3. Compute Pseudo-Residuals:
        //    Calculate the error of the current ensemble. This is the negative
        //    gradient of the loss function. For cross-entropy loss, this is
        //    (true_label - predicted_prob).
        current_probabilities = softmax(ensemble_prediction)
        pseudo_residuals = y - current_probabilities

        // 4. Train a new tree:
        //    Fit a simple Regression Tree (h_m) to predict the pseudo-residuals.
        //    This tree learns to correct the errors of the current ensemble.
        h_m = fit_regression_tree(X, pseudo_residuals)
        append h_m to trees

        // 5. Update the ensemble:
        //    Add the new tree's predictions, scaled by the learning rate,
        //    to the overall ensemble prediction.
        ensemble_prediction = ensemble_prediction + learning_rate * h_m.predict(X)

    RETURN F_0 and the final ensemble of trees
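To keep a runnable sketch short, the version below restricts the pseudocode to the binary case: the sigmoid replaces the softmax, pseudo-residuals become `y - sigmoid(F)`, and scikit-learn's `DecisionTreeRegressor` stands in for `fit_regression_tree`. The multiclass version in the pseudocode would fit one such tree per class each round; that extension, and the use of scikit-learn as the base learner, are my choices, not the original implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_gradient_boosting(X, y, learning_rate=0.1, n_estimators=50):
    """Binary gradient boosting with log-loss. y is an array of 0s and 1s."""
    # Initial prediction: log-odds of the positive-class frequency.
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    F0 = np.log(p / (1 - p))
    F = np.full(len(y), F0)
    trees = []
    for _ in range(n_estimators):
        # Pseudo-residuals: negative gradient of log-loss = y - sigmoid(F).
        residuals = y - sigmoid(F)
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residuals)
        # Shrink each tree's contribution by the learning rate.
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return F0, trees

def predict_proba(F0, trees, X, learning_rate=0.1):
    """Sum the initial log-odds and all scaled tree outputs, then squash."""
    F = np.full(X.shape[0], F0)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return sigmoid(F)
```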
CLASS Pipeline:
    """
    A simplified pseudocode representation of a scikit-learn Pipeline.

    Attributes:
        steps (list): A list of (name, transformer/estimator) tuples.
                      e.g., [('scaler', StandardScaler), ('logreg', LogisticRegression)]
    """

    FUNCTION fit(self, X_train, y_train):
        """
        Fits all transformers and the final estimator sequentially.

        Args:
            X_train (matrix): The training feature data.
            y_train (vector): The training labels.
        """
        // Start with the original training data.
        transformed_X = X_train

        // Loop through all steps EXCEPT the final one (the model).
        FOR each step (name, transformer) in self.steps[:-1]:
            // Fit the transformer on the current data and transform it.
            // The output becomes the input for the next step.
            transformed_X = transformer.fit_transform(transformed_X, y_train)

        // For the final step (the model), just call .fit().
        final_model = self.steps[-1][1]
        final_model.fit(transformed_X, y_train)
        RETURN self

    FUNCTION predict(self, X_new):
        """
        Transforms the new data and makes a prediction using the fitted model.

        Args:
            X_new (matrix): The new, unseen data to predict on.
        """
        // Start with the new, unseen data.
        transformed_X = X_new

        // Loop through all steps EXCEPT the final one (the model).
        FOR each step (name, transformer) in self.steps[:-1]:
            // Use the ALREADY FITTED transformer to just .transform() the new data.
            // No new fitting occurs here, preventing data leakage.
            transformed_X = transformer.transform(transformed_X)

        // For the final step (the model), call .predict() on the transformed data.
        final_model = self.steps[-1][1]
        prediction = final_model.predict(transformed_X)
        RETURN prediction
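The same fit/predict asymmetry can be written as a small runnable Python class. This is a toy illustration of the pattern, not scikit-learn's actual `Pipeline` implementation (which also handles parameter routing, caching, and validation):

```python
class Pipeline:
    """Minimal pipeline: every step but the last is a transformer with
    fit_transform/transform; the last step is an estimator with fit/predict."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) tuples

    def fit(self, X, y):
        for name, transformer in self.steps[:-1]:
            # fit_transform learns the transformer's parameters on the
            # training data, then applies them; output feeds the next step.
            X = transformer.fit_transform(X, y)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for name, transformer in self.steps[:-1]:
            # Reuse the parameters learned in fit(); no refitting here,
            # which is what prevents leakage from new data.
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)
```

Because this class only calls the standard `fit_transform`/`transform`/`fit`/`predict` methods, real scikit-learn transformers and estimators can be plugged into it directly, e.g. `Pipeline([('scaler', StandardScaler()), ('logreg', LogisticRegression())])`.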
# Code snippet from the live application's `_update_prediction` method:
# ... (inside a loop that runs every 2 seconds)

# 1. Engineer features from the live, growing list of key events
features = self._realtime_feature_engineering(current_events)

# 2. Use the loaded model to predict probabilities for each class.
#    The [0] gets the results for our single sample.
probabilities = self.model.predict_proba(features)[0]

# 3. Loop through the results and update the UI labels
for i, boss_name in enumerate(self.label_encoder.classes_):
    # Get the probability for the current boss
    probability = probabilities[i]
    # Format the string to show the name and percentage
    display_text = f"{boss_name}: ({probability:.0%})"
    # Update the corresponding label in the tkinter UI
    self.prediction_labels[boss_name].set(display_text)